Optimal Partitioning of Data Chunks in Deduplication Systems

Authors

  • Michael Hirsch
  • Ariel Ish-Shalom
  • Shmuel Tomi Klein
Abstract

Deduplication is a special case of data compression in which repeated chunks of data are stored only once. For very large chunks, this process may be applied even if the chunks are merely similar and not necessarily identical, in which case the encoding of duplicate data consists of a sequence of pointers to matching parts. However, not every pointer is worth keeping, since each incurs some storage overhead. A linear-time, suboptimal solution of this partitioning problem is presented, followed by an optimal solution with cubic time complexity and quadratic space requirements.
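The trade-off in the abstract can be illustrated with a small dynamic program. This is an illustrative sketch only, not the paper's algorithm: `POINTER_COST` and `LITERAL_COST` are hypothetical parameters, and `matches` is an assumed list of intervals that duplicate already-stored data. The program decides, for each matched interval, whether replacing it with a pointer actually saves space over storing its bytes literally.

```python
POINTER_COST = 16   # hypothetical fixed overhead (in bytes) per pointer record
LITERAL_COST = 1    # hypothetical cost per literally stored byte

def optimal_partition(n, matches):
    """Minimal encoding cost for a chunk of n bytes.

    matches: list of (start, end) half-open intervals that duplicate
    already-stored data and could each be replaced by one pointer.
    Dynamic programming over prefix lengths: best[i] is the cheapest
    encoding of the first i bytes of the chunk.
    """
    INF = float("inf")
    best = [0] + [INF] * n
    for i in range(1, n + 1):
        # Option 1: emit byte i-1 literally.
        best[i] = best[i - 1] + LITERAL_COST
        # Option 2: cover an interval ending at position i with one pointer.
        for (s, e) in matches:
            if e == i and best[s] + POINTER_COST < best[i]:
                best[i] = best[s] + POINTER_COST
    return best[n]
```

For example, a 100-byte chunk fully covered by two 50-byte matches costs two pointers (32) instead of 100 literal bytes, whereas a lone 5-byte match inside a 20-byte chunk is cheaper to store literally, so its pointer is dropped — exactly the "not every pointer is worth keeping" effect the abstract describes.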


Related articles

Similarity Based Deduplication with Small Data Chunks

Large backup and restore systems may have a petabyte or more of data in their repository. Such systems are often compressed by means of deduplication techniques, which partition the input text into chunks and store recurring chunks only once. One approach is to use hashing methods that store a fingerprint for each data chunk, detecting identical chunks with very low probability of collisions...
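The fingerprint-based approach mentioned above can be sketched in a few lines. This is a minimal illustration, assuming the input has already been split into chunks and using SHA-256 as a stand-in for whatever fingerprint function a real system employs: each distinct chunk is stored once, and the original stream is represented as a "recipe" of fingerprints.

```python
import hashlib

def dedup_store(chunks):
    """Store each distinct chunk once, keyed by its fingerprint.

    Returns (store, recipe): store maps fingerprint -> chunk bytes,
    and recipe is the fingerprint sequence that reconstructs the input.
    """
    store = {}
    recipe = []
    for chunk in chunks:
        fp = hashlib.sha256(chunk).hexdigest()
        if fp not in store:          # recurring chunk: keep one copy only
            store[fp] = chunk
        recipe.append(fp)
    return store, recipe

def restore(store, recipe):
    """Reassemble the original data from the recipe."""
    return b"".join(store[fp] for fp in recipe)
```

With a collision-resistant hash the chance that two different chunks share a fingerprint is negligible, which is why such systems can treat equal fingerprints as equal chunks.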


A Cost-efficient Rewriting Scheme to Improve Restore Performance in Deduplication Systems

In chunk-based deduplication systems, logically consecutive chunks are physically scattered across different containers after deduplication, which results in a serious fragmentation problem. The fragmentation significantly reduces restore performance, because the scattered chunks must be read from different containers. Existing work aims to rewrite the fragmented duplicate chunks into new containers...


Accelerating Restore and Garbage Collection in Deduplication-based Backup Systems via Exploiting Historical Information

In deduplication-based backup systems, the chunks of each backup are physically scattered after deduplication, which causes a challenging fragmentation problem. The fragmentation decreases restore performance, and it also leaves invalid chunks physically scattered across different containers after users delete backups. Existing solutions attempt to rewrite duplicate but fragmented chunks to im...


Survey on Data Deduplication for Cloud Storage to Reduce Fragmentation

Data deduplication is an important technique that makes it possible to store more information in less space. The cost and maintenance of information backup storage systems for major enterprises can be minimized by storing data on cloud storage. Data redundancy between different kinds of data storage is minimized by utilizing data deduplication methods. By giving each application differently and...


Cryptographic Hashing Method using for Secure and Similarity Detection in Distributed Cloud Data

Received Jun 29, 2017; Revised Nov 23, 2017; Accepted Dec 17, 2017. The explosive increase of data brings new challenges to data storage and supervision in cloud settings. These data typically have to be processed in an appropriate fashion in the cloud, so any increase in latency may cause an immense loss to the enterprises. Duplicate detection plays a very important role in data management. Data...



Journal:
  • Discrete Applied Mathematics

Volume 212  Issue 

Pages  -

Publication date: 2013